harmful stereotype
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Apertus, Project, Hernández-Cano, Alejandro, Hägele, Alexander, Huang, Allen Hao, Romanou, Angelika, Solergibert, Antoni-Joan, Pasztor, Barna, Messmer, Bettina, Garbaya, Dhia, Ďurech, Eduard Frank, Hakimi, Ido, Giraldo, Juan García, Ismayilzada, Mete, Foroutan, Negar, Moalla, Skander, Chen, Tiancheng, Sabolčec, Vinko, Xu, Yixuan, Aerni, Michael, AlKhamissi, Badr, Mariñas, Inés Altemir, Amani, Mohammad Hossein, Ansaripour, Matin, Badanin, Ilia, Benoit, Harold, Boros, Emanuela, Browning, Nicholas, Bösch, Fabian, Böther, Maximilian, Canova, Niklas, Challier, Camille, Charmillot, Clement, Coles, Jonathan, Deriu, Jan, Devos, Arnout, Drescher, Lukas, Dzenhaliou, Daniil, Ehrmann, Maud, Fan, Dongyang, Fan, Simin, Gao, Silin, Gila, Miguel, Grandury, María, Hashemi, Diba, Hoyle, Alexander, Jiang, Jiaming, Klein, Mark, Kucharavy, Andrei, Kucherenko, Anastasiia, Lübeck, Frederike, Machacek, Roman, Manitaras, Theofilos, Marfurt, Andreas, Matoba, Kyle, Matrenok, Simon, Mendonça, Henrique, Mohamed, Fawzi Roberto, Montariol, Syrielle, Mouchel, Luca, Najem-Meyer, Sven, Ni, Jingwei, Oliva, Gennaro, Pagliardini, Matteo, Palme, Elia, Panferov, Andrei, Paoletti, Léo, Passerini, Marco, Pavlov, Ivan, Poiroux, Auguste, Ponkshe, Kaustubh, Ranchin, Nathan, Rando, Javi, Sauser, Mathieu, Saydaliev, Jakhongir, Sayfiddinov, Muhammad Ali, Schneider, Marian, Schuppli, Stefano, Scialanga, Marco, Semenov, Andrei, Shridhar, Kumar, Singhal, Raghav, Sotnikova, Anna, Sternfeld, Alexander, Tarun, Ayush Kumar, Teiletche, Paul, Vamvas, Jannis, Yao, Xiaozhe, Zhao, Hao, Ilic, Alexander, Klimovic, Ana, Krause, Andreas, Gulcehre, Caglar, Rosenthal, David, Ash, Elliott, Tramèr, Florian, VandeVondele, Joost, Veraldi, Livio, Rajman, Martin, Schulthess, Thomas, Hoefler, Torsten, Bosselut, Antoine, Jaggi, Martin, Schlag, Imanol
We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting `robots.txt` exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
ChatGPT still stereotypes responses based on your name, but less often
OpenAI, the company behind ChatGPT, just released a new research report that examined whether the AI chatbot discriminates against users or stereotypes its responses based on users' names. The company used its own AI model GPT-4o to go through a large number of ChatGPT conversations and analyze whether the chatbot's responses contained "harmful stereotypes" based on who it was conversing with. The results were then double-checked by human reviewers. The screenshots above are examples from legacy AI models to illustrate ChatGPT's responses that were examined by the study. In both cases, the only variable that differs is the user's name.
First-Person Fairness in Chatbots
Eloundou, Tyna, Beutel, Alex, Robinson, David G., Gu-Lemberg, Keren, Brakman, Anna-Luisa, Mishkin, Pamela, Shah, Meghan, Heidecke, Johannes, Weng, Lilian, Kalai, Adam Tauman
Chatbots like ChatGPT are used for diverse purposes, ranging from resume writing to entertainment. These real-world applications are different from the institutional uses, such as resume screening or credit scoring, which have been the focus of much of AI research on fairness. Ensuring equitable treatment for all users in these first-person contexts is critical. In this work, we study "first-person fairness," which means fairness toward the chatbot user. This includes providing high-quality responses to all users regardless of their identity or background and avoiding harmful stereotypes. We propose a scalable, privacy-preserving method for evaluating one aspect of first-person fairness across a large, heterogeneous corpus of real-world chatbot interactions. Specifically, we assess potential bias linked to users' names, which can serve as proxies for demographic attributes like gender or race, in chatbot systems such as ChatGPT, which provide mechanisms for storing and using user names. Our method leverages a second language model to privately analyze name-sensitivity in the chatbot's responses. We verify the validity of these annotations through independent human evaluation. Further, we show that post-training interventions, including RL, significantly mitigate harmful stereotypes. Our approach also yields succinct descriptions of response differences across tasks. For instance, in the "writing a story" task, chatbot responses show a tendency to create protagonists whose gender matches the likely gender inferred from the user's name. Moreover, a pattern emerges where users with female-associated names receive responses with friendlier and simpler language slightly more often than users with male-associated names. Finally, we provide the system messages required for external researchers to further investigate ChatGPT's behavior with hypothetical user profiles.
How AI-Powered Tech Can Harm Children
A new study from University of Washington and Johns Hopkins shows that robots trained on artificial intelligence make decisions imbued with racism and sexism. Of course, robots are only the latest in a long line of new technologies found to perpetuate harmful stereotypes--so do search engines, social media, and video games, as well as other popular tech products trained on huge sets of data and driven by algorithms. That devices feed racist and sexist misinformation to adults is terrible enough. But, as a psychologist and advocate for kids, I worry even more about what's being fed to children, including the very young, who are also exposed to--and influenced by--tech-delivered misinformation about race. The study comes out at a time when, across the U.S., a wave of new legislation is censoring what educators can discuss in the classroom, including topics of race, slavery, gender identity, and politics.
Gender and Racial Bias in Visual Question Answering Datasets
Vision-and-language tasks have drawn increasing attention as a means to evaluate human-like reasoning in machine learning models. A popular task in the field is visual question answering (VQA), which aims to answer questions about images. However, VQA models have been shown to exploit language bias by learning the statistical correlations between questions and answers without looking into the image content: e.g., questions about the color of a banana are answered with yellow, even if the banana in the image is green. If societal bias (e.g., sexism, racism, ableism, etc.) is present in the training data, this problem may be causing VQA models to learn harmful stereotypes. For this reason, we investigate gender and racial bias in five VQA datasets.
Ethical and social risks of harm from Language Models
Weidinger, Laura, Mellor, John, Rauh, Maribeth, Griffin, Conor, Uesato, Jonathan, Huang, Po-Sen, Cheng, Myra, Glaese, Mia, Balle, Borja, Kasirzadeh, Atoosa, Kenton, Zac, Brown, Sasha, Hawkins, Will, Stepleton, Tom, Biles, Courtney, Birhane, Abeba, Haas, Julia, Rimell, Laura, Hendricks, Lisa Anne, Isaac, William, Legassick, Sean, Irving, Geoffrey, Gabriel, Iason
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences. We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities. In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.
How Deep Learning Is Transforming Marketing
The marketing industry currently finds itself at a crossroads. On the one hand, you have an industry that prides itself on its creativity and the ability to come up with surprising and innovative ways to market products. On the other hand, whether you realize it or not, it is an industry that is increasingly technology-driven, relying on the latest in artificial intelligence and deep learning to reach consumers as many times and in as many different ways as possible. Deep learning has already changed how marketing operates in ways both obvious and subtle and has transformed how brands interact with consumers as well as how consumers relate to brands. It has allowed brands (or their algorithms) to gain a more complete understanding of how their customers think, react and purchase, while also allowing for the complete overhaul of internal organizational structures.